Emotion and mental state recognition from speech

Authors

  • Julien Epps
  • Roddy Cowie
  • Shrikanth S. Narayanan
  • Björn W. Schuller
  • Jianhua Tao
Abstract

As research in speech processing has matured, attention has gradually shifted from linguistic-related applications such as speech recognition towards paralinguistic speech processing problems, in particular the recognition of speaker identity, language, emotion, gender, and age. Determination of a speaker’s emotion or mental state is a particularly challenging problem, in view of the significant variability in its expression posed by linguistic, contextual, and speaker-specific characteristics within speech. In response, a range of signal processing and pattern recognition methods have been developed in recent years. Recognition of emotion and mental state from speech is a fundamentally multidisciplinary field, comprising contributions from psychology, speech science, linguistics, (co-occurring) nonverbal communication, machine learning, artificial intelligence and signal processing, among others. Some of the key research problems addressed to date include isolating sources of emotion-specific information in the speech signal, extracting suitable features, forming reduced-dimension feature sets, developing machine learning methods applicable to the task, reducing feature variability due to speaker and linguistic content, comparing and evaluating diverse methods, robustness, and constructing suitable databases. Studies examining the relationships between the psychological basis of emotion, the effect of emotion on speech production, and the measurable differences in the speech signal due to emotion have helped to shed light on these problems; however, substantial research is still required. Taking a broader view of emotion as a mental state, signal processing researchers have also explored the possibilities of automatically detecting other types of mental state which share some characteristics with emotion, for example stress, depression, cognitive load, and ‘cognitive epistemic’ states such as interest, scepticism, etc.
The recent interest in emotion recognition research has seen applications in call centre analytics, human-machine and human-robot interfaces, multimedia retrieval, surveillance tasks, behavioural health informatics, and improved speech recognition. This special issue comprises nine articles covering a range of topics in signal processing methods for vocal source and acoustic feature extraction, robustness issues, novel applications of pattern recognition techniques, methods for detecting mental states, and recognition of non-prototypical spontaneous and naturalistic emotion in speech. These articles were accepted following peer review, and each submission was handled by an editor who was independent from all authors listed in that manuscript. Herein, we briefly introduce the articles comprising this special issue.

Trevino, Quatieri and Malyska bring a new level of sophistication to an old problem, detecting signs of depressive disorders in speech. Their measures of depression come from standard psychiatric instruments, the Quick Inventory of Depressive Symptomatology and Hamilton Depression rating scales. These are linked to measures of speech timing that are much richer than the traditional global measures of speech rate. Results indicate that different speech sounds and sound types behave differently in depression, and may relate to different aspects of depression.

Caponetti, Buscicchio and Castellano propose the use of a more detailed auditory model than that embodied in the widely employed mel frequency cepstral coefficients, for extracting detailed spectral features during emotion recognition. Working from the Lyon cochlear model, the authors demonstrate improvements on a five-class problem from the Speech Under Simulated and Actual Stress database. Their study also further validates the applicability of long short-term memory recurrent neural networks for classification in emotion and mental state recognition problems.
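Several of the articles introduced above build on frame-level spectral features such as mel frequency cepstral coefficients (MFCCs). As a rough illustration only (this is not any of the cited authors' implementations; the frame lengths, filterbank size, and coefficient count below are common but arbitrary choices), a minimal MFCC extractor can be sketched in NumPy/SciPy:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.010,
         n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: pre-emphasis, framing, mel filterbank, log, DCT."""
    # Pre-emphasis boosts high frequencies, compensating for spectral tilt.
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into overlapping, Hamming-windowed frames.
    flen, fstep = int(sr * frame_len), int(sr * frame_step)
    n_frames = 1 + max(0, (len(emph) - flen) // fstep)
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(flen)
    # Per-frame power spectrum.
    pspec = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Triangular filters spaced uniformly on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then a type-2 DCT to decorrelate;
    # keep only the lowest cepstral coefficients.
    feat = np.log(pspec @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

In a typical emotion recognition pipeline, statistics of such frame-level features (means, variances, contours) are pooled per utterance and fed to a classifier; the articles in this issue replace or augment various stages of this pipeline.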
Callejas, Griol and López-Cózar propose a mental state prediction approach that considers both speaker ...

* Correspondence: [email protected]
School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW 2052, Australia
Full list of author information is available at the end of the article

Epps et al. EURASIP Journal on Advances in Signal Processing 2012, 2012:15
http://asp.eurasipjournals.com/content/2012/1/15

Similar articles

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract: Recent developments in robotics and automation have motivated researchers to improve the efficiency of interactive systems through natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...


Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in the emotional speech recognition area in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...


Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...


Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language

The setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes speech features. In this research, the influence of the anger and happiness emotions on speech was evaluated and the results were compared with neutral speech, using the pitch frequency and the first three formant frequencies. The experimental results showed that there are lo...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style and environment. Here, an SER method has been proposed based on a concat...



Journal:
  • EURASIP J. Adv. Sig. Proc.

Volume: 2012  Issue: 

Pages: -

Publication date: 2012